Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information

Authors

  • Marianne Akian
  • Jean Cochet-Terrasson
  • Sylvie Detournay
  • Stéphane Gaubert

Abstract

We consider zero-sum stochastic games with finite state and action spaces, perfect information and mean payoff criteria, without any irreducibility assumption on the Markov chains associated with strategies (multichain games). The value of such a game can be characterized by a system of nonlinear equations involving the mean payoff vector and an auxiliary vector (relative value or bias). Here we develop a policy iteration algorithm for zero-sum stochastic games with mean payoff, following an idea of two of the authors (Cochet-Terrasson and Gaubert, C. R. Math. Acad. Sci. Paris, 2006). The algorithm relies on a notion of nonlinear spectral projection (Akian and Gaubert, Nonlinear Analysis TMA, 2003), which is analogous to the notion of reduction of super-harmonic functions in linear potential theory. To avoid cycling, at each degenerate iteration (one in which the mean payoff vector is not improved), the new relative value is obtained by reducing the earlier one. We show that the sequence of values and relative values satisfies a lexicographical monotonicity property, which implies that the algorithm terminates. We illustrate the algorithm on a mean-payoff version of Richman games (stochastic tug-of-war, or discrete infinity Laplacian type equation), in which degenerate iterations are frequent. We report numerical experiments on large-scale instances arising from these games, as well as from monotone discretizations of a mean-payoff pursuit-evasion deterministic differential game.

2010 Mathematics Subject Classification: 91A20; 31C45; 47H09; 91A15; 91A43; 90C40
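The following is a minimal Python sketch that illustrates two objects named in the abstract: the Shapley (dynamic programming) operator T of a perfect-information game, and its mean payoff vector. It estimates that vector as T^k(0)/k. For finite state and action spaces this limit is a standard fact. This is not the authors' policy iteration algorithm. It swaps in plain value iteration, which converges only at rate O(1/k) and produces no bias vector. The game encoding, the function names `shapley_operator` and `estimate_mean_payoff`, and the four-state example are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: an action is (reward, transitions), where
# transitions maps each successor state to its probability.
Action = Tuple[float, Dict[int, float]]


def shapley_operator(x: List[float], owners: List[str],
                     actions: List[List[Action]]) -> List[float]:
    """Apply T once: T(x)_i = max_a (r_ia + sum_j P_ia(j) x_j) at MAX states,
    and the same expression minimized over actions at MIN states."""
    result = []
    for i, acts in enumerate(actions):
        values = [r + sum(p * x[j] for j, p in trans.items()) for r, trans in acts]
        result.append(max(values) if owners[i] == "MAX" else min(values))
    return result


def estimate_mean_payoff(owners: List[str], actions: List[List[Action]],
                         iterations: int = 1000) -> List[float]:
    """Value iteration: approximate the mean payoff vector by T^k(0) / k."""
    x = [0.0] * len(actions)
    for _ in range(iterations):
        x = shapley_operator(x, owners, actions)
    return [xi / iterations for xi in x]


# Hypothetical 4-state example (the owner of a single-action state is irrelevant).
owners = ["MAX", "MIN", "MAX", "MIN"]
actions = [
    [(1.0, {0: 1.0}), (0.0, {1: 1.0})],  # state 0: loop with reward 1, or move to state 1
    [(2.0, {1: 1.0}), (0.0, {2: 1.0})],  # state 1: loop with reward 2, or move to state 2
    [(4.0, {2: 1.0})],                    # state 2: absorbing, reward 4 per step
    [(0.0, {1: 0.5, 2: 0.5})],            # state 3: fair coin flip between states 1 and 2
]
chi = estimate_mean_payoff(owners, actions)
print([round(c, 3) for c in chi])  # prints [1.996, 1.998, 4.0, 2.996]
```

For this example the exact mean payoff vector is (2, 2, 4, 3):

- At state 1, MIN prefers the reward-2 loop to moving toward the reward-4 absorbing state.
- At state 0, MAX prefers reaching state 1 to its own reward-1 loop.
- State 3 averages the two absorbing values.

The residual O(1/k) error in the printed estimate is one reason exact methods such as policy iteration are attractive.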

Similar resources

A Convex Programming-based Algorithm for Mean Payoff Stochastic Games with Perfect Information

We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph $G = (V, E)$, with local rewards $r : E \to \mathbb{Z}$, and three types of positions: black $V_B$, white $V_W$, and random $V_R$, forming a partition of $V$. It is a long-standing open question whether a polynomial-time algorithm for BWR-games exists, even when $|V_R| = 0$. In fact, a pseudo-polynomial ...

On the computational complexity of solving stochastic mean-payoff games

We consider some well-known families of two-player, zero-sum, turn-based, perfect information games that can be viewed as special cases of Shapley’s stochastic games. We show that the following tasks are polynomial-time equivalent: • solving simple stochastic games, • solving stochastic mean-payoff games with rewards and probabilities given in unary, and • solving stochastic mean-payoff games ...

The Complexity of Solving Stochastic Games on Graphs

We consider some well-known families of two-player zero-sum perfect-information stochastic games played on finite directed graphs. Generalizing and unifying results of Liggett and Lippman, Zwick and Paterson, and Chatterjee and Henzinger, we show that the following tasks are polynomial-time (Turing) equivalent. – Solving stochastic parity games, – Solving simple stochastic games, – Solving stoc...

Finite-step Algorithms for Single-controller and Perfect Information Stochastic Games

After a brief survey of iterative algorithms for general stochastic games, we concentrate on finite-step algorithms for two special classes of stochastic games. They are Single-Controller Stochastic Games and Perfect Information Stochastic Games. In the case of single-controller games, the transition probabilities depend on the actions of the same player in all states. In perfect information st...

Canonical forms of two-person zero-sum limit average payoff stochastic games

We consider two-person zero-sum stochastic games with perfect information and, for each $k \in \mathbb{Z}_+$, introduce a new payoff function, called the k-total reward. For k = 0 and 1 these are the so-called mean and total rewards, respectively. For all k, we prove solvability of the considered games in pure stationary strategies, and show that the uniformly optimal strategies for the discounted mean payoff...

Journal:
  • CoRR

Volume: abs/1208.0446

Publication date: 2012